Bridging the Gap between Dictionary and Thesaurus

Author

  • Oi Yee Kwong
Abstract

This paper presents an algorithm to integrate different lexical resources, through which we hope to overcome the individual inadequacy of the resources and thus obtain enriched lexical semantic information for applications such as word sense disambiguation. We used WordNet as a mediator between a conventional dictionary and a thesaurus. Preliminary results support our hypothesised structural relationship, which enables the integration of the resources. These results also suggest that we can combine the resources to achieve an overall balanced degree of sense discrimination.
1 Introduction
It is generally accepted that applications such as word sense disambiguation (WSD), machine translation (MT) and information retrieval (IR) require a wide range of resources to supply the necessary lexical semantic information. For instance, Calzolari (1988) proposed a lexical database in Italian which has the features of both a dictionary and a thesaurus, and Klavans and Tzoukermann (1995) tried to build a fuller bilingual lexicon by enhancing machine-readable dictionaries with large corpora. Among the attempts to enrich lexical information, many have been directed at the analysis of dictionary definitions and the transformation of their implicit information into explicit knowledge bases for computational purposes (Amsler, 1981; Calzolari, 1984; Chodorow et al., 1985; Markowitz et al., 1986; Klavans et al., 1990; Vossen and Copestake, 1993). Nonetheless, dictionaries are also notorious for their non-standardised sense granularity, and the taxonomies obtained from definitions are inevitably ad hoc. It would therefore be a good idea if we could unify our lexical semantic knowledge by some existing and widely exploited classification, such as the system in Roget's Thesaurus (Roget, 1852), which has remained essentially intact over the years and has been used in WSD (Yarowsky, 1992).
While the objective is to integrate different lexical resources, the problem is: how do we reconcile the rich but variable information in dictionary senses with the cruder but more stable taxonomies like those in thesauri? This work is intended to fill this gap. We use WordNet as a mediator in the process. In the following, we will outline an algorithm to map word senses in a dictionary to semantic classes in some established classification scheme.
2 Interrelatedness of the Resources
The three lexical resources used in this work are the 1987 revision of Roget's Thesaurus (ROGET) (Kirkpatrick, 1987), the Longman Dictionary of Contemporary English (LDOCE) (Procter, 1978) and WordNet 1.5 (WN) (Miller et al., 1993). Figure 1 shows how word senses are organised in them. As we have mentioned, instead of directly mapping an LDOCE definition to a ROGET class, we bridge the gap with WN, as indicated by the arrows in the figure. Such a route is made feasible by linking the structures that the resources have in common.
Words are organised in alphabetical order in LDOCE, as in other conventional dictionaries. The senses are listed after each entry, in the form of text definitions. WN groups words into sets of synonyms ("synsets"), with an optional textual gloss. These synsets form the nodes of a taxonomic hierarchy. In ROGET, each semantic class comes with a number, under which words are first sorted by part of speech and then grouped into paragraphs according to the idea conveyed.
Let us refer to Figure 1 and start from word x2 in WN synset X. Since words expressing every aspect of an idea are grouped together in ROGET, we can expect to find not only the words in synset X, but also those in the coordinate WN synsets (i.e. M and P, with words m1, m2, p1, p2, etc.) and the superordinate WN synsets (i.e. C and A, with words c1, c2, etc.) in the same ROGET paragraph.
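The expectation that a ROGET paragraph gathers the words of a synset together with its coordinate and superordinate synsets can be sketched on a toy version of Figure 1. This is a minimal illustration under stated assumptions, not the author's algorithm: we assume C is the immediate hypernym of X, M and P, and A is the hypernym of C; the words, the `HYPERNYM` table and the scoring rule `overlap_score` (the fraction of expected words that a candidate paragraph contains) are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy modelled on Figure 1 (assumed layout): A is the root,
# C is A's child, and X, M and P are coordinate synsets under C.
# Words are placeholders, not real WordNet or ROGET data.
SYNSETS: Dict[str, List[str]] = {
    "A": ["a1", "a2"],
    "C": ["c1", "c2"],
    "X": ["x1", "x2"],
    "M": ["m1", "m2"],
    "P": ["p1", "p2"],
}
# Immediate hypernym of each synset (None marks the root).
HYPERNYM: Dict[str, Optional[str]] = {"A": None, "C": "A", "X": "C", "M": "C", "P": "C"}


def superordinates(synset: str) -> List[str]:
    """All ancestors of `synset`, nearest first (C, then A, for X)."""
    chain: List[str] = []
    parent = HYPERNYM[synset]
    while parent is not None:
        chain.append(parent)
        parent = HYPERNYM[parent]
    return chain


def coordinates(synset: str) -> List[str]:
    """Synsets sharing the immediate hypernym of `synset` (M and P, for X)."""
    parent = HYPERNYM[synset]
    if parent is None:
        return []
    return [s for s, h in HYPERNYM.items() if h == parent and s != synset]


def expected_class_words(synset: str) -> Set[str]:
    """Union of the words in the synset, its coordinates and its superordinates."""
    related = [synset] + coordinates(synset) + superordinates(synset)
    return {word for s in related for word in SYNSETS[s]}


def overlap_score(synset: str, paragraph: Set[str]) -> float:
    """Hypothetical score: fraction of the expected words found in a thesaurus paragraph."""
    expected = expected_class_words(synset)
    return len(expected & paragraph) / len(expected)


if __name__ == "__main__":
    # Two made-up ROGET paragraphs, represented as plain word sets.
    paragraph_1 = {"x2", "m1", "p2", "c1", "a1", "other"}
    paragraph_2 = {"x1", "unrelated"}
    print(overlap_score("X", paragraph_1))  # 0.5
    print(overlap_score("X", paragraph_2))  # 0.1
```

On this toy data, synset X yields ten expected words; the first paragraph contains five of them and the second only one, so the first would be preferred as X's thesaurus class. A plain overlap ratio is used here only because it is easy to read; any other set-similarity measure could be substituted.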
In other words, the thesaurus class to which x2 belongs should include roughly $X \cup M \cup P \cup C \cup A$. Meanwhile, the LDOCE definition corresponding to the sense of synset X (denoted by $D_X$) is expected to be similar to the textual gloss of synset X (denoted by $Gl(X)$). In addition, given that it is not unusual for

Similar sources

Building A Large Thesaurus For Information Retrieval

Information retrieval systems that support searching of large textual databases are typically accessed by trained search intermediaries who provide assistance to end users in bridging the gap between the languages of authors and inquirers. We are building a thesaurus in the form of a large semantic network to support interactive query expansion and search by end users. Our lexicon is being bui...

Causes of the Gap between Junior High School Intended, Implemented, and Attained Curricula and Ways of Bridging It

Causes of the Gap between Junior High School Intended, Implemented, and Attained Curricula and Ways of Bridging It   M.A. Jamaalifar* S. Sh. HaashemiMoghadam, Ph.D.** Z. Aabedi Karajibaan, Ph.D.*** A.R. Faghihi, Ph.D.****   To identify the causes of the perceived gap between junior high school intended, implemented, and attained curricula, a group of 30 curriculum planners, 50 educationa...

Bridging the Gap Between Research and Policy and Practice; Comment on “CIHR Health System Impact Fellows: Reflections on ‘Driving Change’ Within the Health System”

Far too often, there is a gap between research and policy and practice. Too much research is undertaken with little relevance to real life problems or is reported in ways that are obscure and impenetrable. At the same time, many policies are developed and implemented but are untouched by, or even contrary to evidence. An accompanying paper describes an innovative progr...

Cross border E-Science and Research Partnership: Bridging the Gap Between Science and Media

  E-Science is a tool that helps scientists to store, interpret, analyze and make a network of their data, and it can play a critical role in different aspects of the scientific goals and research. This commentary, under the topic of Cross Border E-Science and Research Partnership: Bridging the Gap between Science and Media,[1] attempts to shed light on E-Science with emphasis on three importa...

-

The development and evolution of any system–person, organization–nation depends on how the system succeeds to bridge the gap between what the system knows and what the system does (with the knowledge). We call this the gap between knowing and doing or the knowing-doing gap. If the system does not do what it knows, it will lose out in competition with other systems, its relative performance in...

Invited Contribution State of the Art on Swimming Physiology and Coaching Practice: Bridging the Gap between Theory and Practice

The aim of the present paper was to survey the state of the art on swimming physiology as related to coaching practice in order to help bridge the gap between theory and practice. Systematic literature searches were performed through the years 1990 – 2006 utilising EBSCOhost Research Databases and SportDiscus. Ovid Medline was used to scan materials for randomized controlled trials. The searc...

Publication date: 1998